10 research outputs found

    Automated Georeferencing of Antarctic Species

    Many text documents in the biological domain contain references to the toponym of specific phenomena (e.g. species sightings) in natural language form, such as "In Garwood Valley summer activity was 0.2% for Umbilicaria aprina and 1.7% for Caloplaca sp. ...". While methods have been developed to extract place names from documents, and attention has been given to the interpretation of spatial prepositions, the ability to connect toponym mentions in text with the phenomena to which they refer (in this case species) has been given limited attention, but would be of considerable benefit for the task of mapping specific phenomena mentioned in text documents. As part of work to create a pipeline to automate georeferencing of species within legacy documents, this paper proposes a method to: (1) recognise species and toponyms within text and (2) match each species mention to the relevant toponym mention. We find significant promise in a bespoke rules- and dictionary-based approach to recognising species within text (F1 scores up to 0.87 including partial matches) but less success, as yet, in recognising toponyms using multiple gazetteers combined with an off-the-shelf natural language processing tool (F1 up to 0.62). Most importantly, we offer a contribution to the relatively nascent area of matching toponym references to the object they locate (in our case species), including cases in which the toponym and species are in different sentences. We use tree-based models to achieve precision as high as 0.88 or an F1 score up to 0.68, depending on the downsampling rate. Initial results outperform previous research on detecting entity relationships that may cross sentence boundaries within biomedical text, and differ from previous work in specifically addressing species mapping.

    HILT : High-Level Thesaurus Project M2M Feasibility Study : [Final Report]

    The project was asked to investigate the feasibility of developing SOAP-based interfaces between JISC IE services and Wordmap APIs and non-Wordmap versions of the HILT pilot demonstrator created under HILT Phase II, and to determine the scope and cost of the provision of an actual demonstrator based on each of these approaches. In doing so it was to take into account the possibility of a future Zthes-based solution using Z39.50 or OAI-PMH, and the syntax and data-exchange protocol implications of eScience and semantic-web developments. It was agreed that the primary concerns of the study should be an assessment of the feasibility, scope, and cost of a follow-up M2M pilot that considered the best options in respect of: query protocols (SOAP, Z39.50, SRW, OAI) and associated data profiles (e.g. Zthes for Z39.50 and for SRW); and standards for structuring thesauri and thesauri-type information (e.g. the Zthes XML DTD and the SRW version of it, and SKOS-Core). The study was carried out within the allotted timescale, with this Final Report submitted to JISC on 31st March 2005 as scheduled. The detailed proposal for a follow-up project is currently under discussion and will be finalised, as agreed with JISC, by mid-April. It was concluded that an M2M pilot was feasible. A proposal for a follow-up M2M pilot project has been scoped, and is currently being costed.

    EDINA Digimap: New Developments in the Internet Mapping and Data Service for the UK Higher Education Community

    Following successful trials in six United Kingdom university map collections, the EDINA Digimap web-based mapping service was launched on 10 January 2000. Digimap gives access to current Ordnance Survey of Great Britain (OS) maps ranging in scale from 1:1,250 to 1:250,000 and also to the raw digital map data. This paper looks at the background to the service, the facilities it offers to the UK Higher Education (HE) community, and future plans for incorporating other data, including historic mapping and aerial photography.

    Challenges in supporting extraction of knowledge about environmental objects and events from geosensor data

    Technologies for capturing large amounts of real-time and high-detail data about the environment have advanced rapidly; our ability to use this data to understand the monitored settings and support decision-making has not. Visual analytics, which creates suitable tools and interfaces that combine computational power with the human capability for visual sense-making, is a promising approach. Geosensor networks monitor a range of different complex environmental settings, collecting heterogeneous data at different spatial and temporal scales. Similarly, domain experts with specific preferences and requirements use the collected data. Additionally, long-term monitoring networks may aim to increase sensor node longevity by minimizing storage and communication load. Based on these aspects, four key challenges for the extraction of knowledge about environmental objects and events from geosensor data are identified: dynamics and uncertainty of the continuous stream of recorded data; different scales in data collection but also data analysis at a range of aggregation levels; decentralized data processing and storage; and evaluation of the effectiveness, efficiency and completeness of implemented decentralized visual analytics approaches.